Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs
We study clustering problems such as k-Median, k-Means, and Facility Location in graphs of low highway dimension, a graph parameter modeling transportation networks. It was previously shown that approximation schemes for these problems exist which either run in quasi-polynomial time (assuming constant highway dimension) [Feldmann et al. SICOMP 2018] or run in FPT time (parameterized by the number of clusters k, the highway dimension, and the approximation factor) [Becker et al. ESA 2018, Braverman et al. 2020]. In this paper we show that a polynomial-time approximation scheme (PTAS) exists (assuming constant highway dimension). We also show that the considered problems are NP-hard on graphs of highway dimension 1.
Experimental Evaluation of Fully Dynamic k-Means via Coresets
For a set of points in ℝ^d, the Euclidean k-means problem consists of finding k centers such that the sum of squared distances from each data point to its closest center is minimized. Coresets are one of the main tools developed recently to solve this problem in a big-data context. They allow one to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data.
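The objective and the coreset guarantee can be stated concretely. The sketch below (illustrative, not taken from the paper) evaluates the weighted k-means cost, so the same function scores both the full data (unit weights) and a weighted coreset:

```python
import numpy as np

def kmeans_cost(points, centers, weights=None):
    """Sum of squared distances from each (weighted) point to its nearest center."""
    points = np.asarray(points, float)
    centers = np.asarray(centers, float)
    if weights is None:
        weights = np.ones(len(points))
    # pairwise squared Euclidean distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((np.asarray(weights, float) * d2.min(axis=1)).sum())

# An eps-coreset (S, w) guarantees that kmeans_cost(S, C, w) is within a
# (1 +/- eps) factor of kmeans_cost(data, C) for EVERY candidate center set C.
```

The "almost equivalent guarantee" in the abstract is exactly this uniform (1 ± ε) cost preservation over all center sets.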
In this work, we study coresets in a fully dynamic setting: points are added and deleted, with the goal of efficiently maintaining a coreset from which a k-means solution can be computed. Based on an algorithm of Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% of the quality of the k-means solution.
Comment: Accepted at ALENEX 2
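The fully dynamic interface described above can be sketched as follows. This toy class is only an illustration of the insert/delete/query contract (it serves a uniform sample as a weighted summary); the actual algorithm of Henzinger and Kale maintains a provable coreset via a far more sophisticated structure:

```python
import random

class DynamicCoreset:
    """Toy fully dynamic point set exposing insert, delete, and a weighted
    summary. Illustrative interface only, NOT the Henzinger-Kale algorithm."""

    def __init__(self, size=100, seed=0):
        self.size = size            # target summary size
        self.points = {}            # id -> point
        self.next_id = 0
        self.rng = random.Random(seed)

    def insert(self, p):
        self.points[self.next_id] = p
        self.next_id += 1

    def delete(self, pid):
        self.points.pop(pid, None)

    def coreset(self):
        pts = list(self.points.values())
        if len(pts) <= self.size:
            return [(p, 1.0) for p in pts]
        sample = self.rng.sample(pts, self.size)
        w = len(pts) / self.size    # each sampled point stands in for w points
        return [(p, w) for p in sample]
```

A k-means solver is then run on the weighted summary after each update batch, rather than on the full point set.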
A Quasi-Polynomial-Time Approximation Scheme for Vehicle Routing on Planar and Bounded-Genus Graphs
The Capacitated Vehicle Routing problem is a generalization of the Traveling Salesman problem in which a set of clients must be visited by a collection of capacitated tours. Each tour can visit at most Q clients and must start and end at a specified depot. We present the first approximation scheme for Capacitated Vehicle Routing for non-Euclidean metrics. Specifically, we give a quasi-polynomial-time approximation scheme for Capacitated Vehicle Routing with fixed capacities on planar graphs. We also show how this result can be extended to bounded-genus graphs and polylogarithmic capacities, as well as to variations of the problem that include multiple depots and charging penalties for unvisited clients.
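For concreteness, a solution to this problem is a set of tours, each starting and ending at the depot and visiting at most Q clients, and its cost is the total tour length. A minimal checker (a sketch under assumed inputs, not part of the paper):

```python
import math

def cvrp_cost(depot, tours, Q, dist=None):
    """Total length of a set of capacitated tours, each depot -> clients -> depot.
    Raises if any tour exceeds capacity Q. `dist` defaults to Euclidean distance."""
    if dist is None:
        dist = math.dist
    total = 0.0
    for tour in tours:
        if len(tour) > Q:
            raise ValueError(f"tour visits {len(tour)} > Q = {Q} clients")
        stops = [depot, *tour, depot]   # close the tour at the depot
        total += sum(dist(a, b) for a, b in zip(stops, stops[1:]))
    return total
```

An approximation scheme returns, for any fixed ε, a feasible set of tours whose `cvrp_cost` is at most (1 + ε) times optimal.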
Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation
In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there is no known deterministic dimensionality-reduction procedure or coreset construction that avoids an exponential dependency on the input dimension d, the precision parameter ε, or k. Furthermore, there is no coreset construction that succeeds with probability 1 and whose size does not depend on the number of input points, n. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov.'20]. Similarly, the best approximation ratios achievable deterministically without a complexity exponential in the dimension, even when allowing a complexity FPT in the number of clusters k, stand in sharp contrast with the (1+ε)-approximation achievable for both k-median and k-means in that case when allowing randomization.
In this paper, we provide deterministic sketch constructions for clustering whose size bounds are close to the best known randomized ones. We also construct a deterministic algorithm for computing a (1+ε)-approximation to k-median and k-means in high-dimensional Euclidean spaces, in time close to the best randomized complexity.
Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling and immediately improves over the recent results of [Braverman et al. FOCS'22].
Comment: FOCS 2023. Abstract reduced for arXiv requirement.
Near-linear time approximation schemes for clustering in doubling metrics
We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of constant doubling dimension. We give the first nearly linear-time approximation schemes for each problem, making a significant improvement over the state-of-the-art algorithms. Moreover, we show how to extend the techniques used to get the first efficient approximation schemes for the problems of prize-collecting k-Median and k-Means, and efficient bicriteria approximation schemes for k-Median with outliers, k-Means with outliers, and k-Center.
A New Coreset Framework for Clustering
Given a metric space, the (k, z)-clustering problem consists of finding k centers such that the sum, over every point, of its distance to its closest center raised to the power z is minimized. This encapsulates the famous k-median (z = 1) and k-means (z = 2) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of solutions, also known as coresets, has been an important research direction over the last 15 years.
In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean spaces, doubling metrics, and minor-free metrics to the general metric case.
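The (k, z) objective can be written in a few lines; a small illustrative sketch showing how the two classic problems are special cases:

```python
import math

def clustering_cost(points, centers, z):
    """(k, z)-clustering cost: sum over points of (distance to nearest center)^z."""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

# z = 1 recovers the k-median objective, z = 2 recovers k-means.
```

A coreset for (k, z)-clustering is a small weighted point set on which this cost is preserved up to a (1 ± ε) factor for every choice of k centers.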
Differential Privacy for Clustering Under Continual Observation
We consider the problem of privately clustering a dataset in ℝ^d that undergoes both insertions and deletions of points. Specifically, we give an (ε, δ)-differentially private clustering mechanism for the k-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number of updates. The multiplicative error is almost the same as in the non-private setting. To achieve this, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for k-means. We also partially extend our results to the k-median problem.
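To illustrate where the additive error in such mechanisms comes from (this is the standard noisy-average building block of differentially private k-means, not the paper's continual-observation mechanism, and the calibration of the noise scale to (ε, δ) is omitted):

```python
import numpy as np

def private_centroid(points, sigma, rng):
    """Release a cluster centroid with Gaussian noise on the sum and the count,
    the usual primitive in (eps, delta)-DP k-means steps. `sigma` is assumed to
    be calibrated to the privacy budget elsewhere."""
    points = np.asarray(points, float)
    noisy_sum = points.sum(axis=0) + rng.normal(0.0, sigma, points.shape[1])
    noisy_count = max(1.0, len(points) + rng.normal(0.0, sigma))
    return noisy_sum / noisy_count

rng = np.random.default_rng(0)
c = private_centroid([[0.0, 0.0], [2.0, 2.0]], sigma=0.1, rng=rng)
```

The noise added to sums and counts is what produces the additive error term; the contribution of the paper is keeping that term only logarithmic in the number of updates.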
Fully Dynamic Consistent Facility Location
We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median, and k-means was raised in a recent paper by Hubert Chan et al. [WWW'18] and also appears as a natural follow-up to the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams).

In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant-factor approximation with O(n log n) update time and total recourse O(n). This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes O(n) time to describe its distances to the other points, and we give a simple lower bound of O(n) for the recourse. Moreover, we generalize this result to the k-median and k-means problems: our algorithm maintains a constant-factor approximation in time Õ(n + k^2).

We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time t, while having a much better running time.
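As a reference point for the objective being maintained, here is a minimal evaluator for the (uncapacitated) facility location cost, a sketch and not the paper's algorithm:

```python
def facility_location_cost(open_facilities, clients, opening_cost, dist):
    """Facility Location objective: opening costs of the chosen facilities
    plus each client's distance to its nearest open facility."""
    opening = sum(opening_cost[f] for f in open_facilities)
    connection = sum(min(dist(c, f) for f in open_facilities) for c in clients)
    return opening + connection
```

The dynamic algorithm keeps this cost within a constant factor of optimal after every insertion or deletion, while reopening or closing only O(n) facilities in total (the recourse).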